Skip to content

Conversation

phofl
Copy link
Member

@phofl phofl commented Mar 17, 2023

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

the read-only pr changed this to _values which returns a DatetimeArray instead of a ndarray, which caused a performance regression

https://asv-runner.github.io/asv-collection/pandas/#io.csv.ReadCSVConcatDatetime.time_read_csv

@phofl phofl added the IO CSV read_csv, to_csv label Mar 17, 2023
@phofl phofl added this to the 2.0 milestone Mar 17, 2023
)._values
)
if isinstance(result, DatetimeIndex):
arr = result.to_numpy()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we know it is timezone-naive here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no tz-aware.

I guess you are referring to result._values._ndarray? Tried this first, but breaks tests

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its its tzaware then to_numpy() should convert to object, which i havent checked but assume we dont want here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it does, but I am not too concerned by this since this is the same behavior as before. This keeps performance at least stable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to clarify my initial response: we can either be tz aware or naive, depends on the input

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel ok with merging? We should get this into 2.0

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK with me

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thx

@phofl phofl added the Performance Memory or execution speed performance label Mar 29, 2023
@phofl phofl merged commit beec0e8 into pandas-dev:main Mar 29, 2023
meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Mar 29, 2023
@phofl phofl deleted the perf_read_csv_datetime branch March 29, 2023 15:33
mroeschke pushed a commit that referenced this pull request Mar 29, 2023
…in read_csv when converting datetimes) (#52278)

Backport PR #52057: PERF: Fix performance regression in read_csv when converting datetimes

Co-authored-by: Patrick Hoefler <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

IO CSV read_csv, to_csv Performance Memory or execution speed performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants